The Prior Influence Function in Variational Bayes

Authors

  • Ryan Giordano
  • Tamara Broderick
  • Michael Jordan
Abstract

In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. One hopes that the posterior is robust to reasonable variation in the choice of prior, since this choice is made by the modeler and is often somewhat subjective. A different, equally subjectively plausible choice of prior may result in a substantially different posterior, and so in different conclusions drawn from the data. Were this to be the case, our conclusions would not be robust to the choice of prior. To determine whether our model is robust, we must quantify how sensitive our posterior is to perturbations of our prior. Variational Bayes (VB) methods are fast, approximate methods for posterior inference. As with any Bayesian method, it is useful to evaluate the robustness of a VB approximate posterior to changes in the prior. In this paper, we derive VB versions of classical non-parametric local robustness measures. In particular, we show that the influence function of Gustafson (2000) has a simple, easy-to-calculate closed-form expression for VB approximations. We then demonstrate how local robustness measures can be inadequate for non-local prior changes, such as replacing one prior entirely with another. We propose a simple approximate non-local robustness measure and demonstrate its effectiveness on a simulated data set.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

1 Local robustness and the influence function

Bayesian robustness studies how changes to the model (i.e., the prior and likelihood) and to the data affect the posterior. If important aspects of the posterior are meaningfully sensitive to subjectively reasonable perturbations of the inputs, then the posterior is "non-robust" to these perturbations. In this paper, we focus on quantifying the sensitivity of posterior means to perturbations of the prior, either infinitesimally mixing or completely replacing the original prior with another "contaminating prior". Our methods allow fast estimation of sensitivity to any contaminating prior without re-fitting the model. We follow and extend the work of Gustafson (1996) and Gustafson (2000) to variational Bayes and to approximate non-local measures of sensitivity. For a more general review of Bayesian robustness, see Berger et al. (2000).

We will now define some terminology. Denote our N data points by x = (x_1, ..., x_N), with x_n ∈ R^D. Denote our parameter by the vector θ ∈ R^K. We will suppose that we are interested in the robustness of our prior to a scalar parameter ε, where our prior can be written as p(θ | ε). Let p_ε^x denote the posterior distribution of θ with prior given by ε and conditional on x, as given by Bayes' theorem:

p_\epsilon^x(\theta) := p(\theta \mid x, \epsilon) = \frac{p(x \mid \theta)\, p(\theta \mid \epsilon)}{p(x)}.

A typical end product of a Bayesian analysis might be a posterior expectation of some function, E_{p_ε^x}[g(θ)], which is a functional of g(θ) and p_ε^x(θ). Local robustness considers how much E_{p_ε^x}[g(θ)] changes locally in response to small perturbations in the value of ε (Gustafson, 2000). In the present work, we consider mixing our original prior, p_0(θ), with some known alternative functional form, p_c(θ):

p(\theta \mid \epsilon) = (1 - \epsilon)\, p_0(\theta) + \epsilon\, p_c(\theta) \quad \text{for } \epsilon \in [0, 1].    (1)

This is known as epsilon contamination (the subscript c stands for "contamination"), and its construction guarantees that the perturbed prior is properly normalized.
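To make equation (1) concrete, here is a minimal Python sketch of an epsilon-contaminated prior. The specific densities (a standard normal p_0 and a Student-t contaminating prior p_c) are illustrative assumptions, not choices from the paper; any pair of normalized densities would do.

    # Sketch: epsilon contamination of a prior, equation (1).
    # p0 (normal) and pc (Student-t) are illustrative choices only.
    import numpy as np
    from scipy import stats
    from scipy.integrate import trapezoid

    p0 = stats.norm(0.0, 1.0)   # original prior (assumed for illustration)
    pc = stats.t(df=3)          # contaminating prior (assumed for illustration)

    def perturbed_prior_pdf(theta, eps):
        """Density of p(theta | eps) = (1 - eps) p0(theta) + eps pc(theta)."""
        return (1.0 - eps) * p0.pdf(theta) + eps * pc.pdf(theta)

    # A mixture of two normalized densities is itself normalized for eps in [0, 1].
    grid = np.linspace(-50.0, 50.0, 200001)
    print(trapezoid(perturbed_prior_pdf(grid, eps=0.1), grid))  # ~= 1.0

Because the mixture weights sum to one, no renormalization is needed for any ε in [0, 1], which is part of what makes this family convenient for sensitivity analysis.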
The contaminating prior p_c(θ) need not be in the same parametric family as p_0(θ), so as p_c(θ) ranges over all possible priors, equation (1) represents an expressive class of perturbations. Under mild assumptions (given in section A), the local sensitivity measure at the prior p(θ | ε) given by a particular ε is

S_\epsilon^{p_c} = \frac{d}{d\epsilon} E_{p_\epsilon^x}[g(\theta)] = \mathrm{Cov}_{p_\epsilon^x}\left( g(\theta),\ \frac{p_c(\theta) - p_0(\theta)}{p_0(\theta) + \epsilon\,(p_c(\theta) - p_0(\theta))} \right).    (2)

The definition in equation (2) depends on a choice of p_c(θ), which we denote with a superscript on S_ε^{p_c}. At ε = 0, we recover the local sensitivity around p_0(θ), which we denote S_0^{p_c}. Rather than choose some finite set of p_c(θ) and calculate their corresponding S_ε^{p_c}, one can work with a single function that summarizes the effect of any p_c(θ), called the "influence function" (Gustafson, 2000). Observing that equation (2) is a linear functional of p_c(θ) when g(θ), ε, and p(θ | ε) are fixed, the influence function (when it exists) is defined as the linear operator I_ε(θ) that characterizes the dependence of S_ε^{p_c} on p_c(θ):

S_\epsilon^{p_c} = \int I_\epsilon(\theta)\, p_c(\theta)\, d\theta, \quad \text{where} \quad I_\epsilon(\theta) := \frac{p_\epsilon^x(\theta)}{p(\theta \mid \epsilon)} \left( g(\theta) - E_{p_\epsilon^x}[g(\theta)] \right).    (3)

At ε = 0, we recover the influence function around p_0(θ), which we denote I_0(θ). When perturbing a low-dimensional marginal of the prior, I_0(θ) is an easy-to-visualize summary of the sensitivity to an arbitrary p_c(θ) using quantities calculated only under p_0(θ) (see the example in section 4 and the extended discussion in Gustafson (2000)). Additionally, the worst-case prior in a suitably defined metric ball around p_0(θ) is a functional of the influence function, as shown in Gustafson (2000).

2 Variational approximation and linear response

We now derive a version of equation (2) for variational Bayes (VB) approximations to the posterior. Recall that a variational approximate posterior is a distribution selected to minimize the Kullback-Leibler (KL) divergence to p_ε^x across distributions q in some class Q. Let q_ε^x denote the variational approximation to the posterior p_ε^x. We assume that distributions in Q are smoothly parameterized by a finite-dimensional parameter η whose optimum lies in the interior of some feasible set Ω_η.

We would like to calculate the local robustness measures of section 1 for the variational approximation q_ε^x, but a direct evaluation of the covariance in equation (2) can be misleading. For example, a common choice of the approximating family Q is the class of distributions that factorize across the components of θ. This is known as the "mean field approximation" (Wainwright and Jordan, 2008). By construction, a mean field approximation does not model covariances between independent components of θ, so a naive estimate of the covariance in equation (2) may erroneously suggest that the prior on one component of θ cannot affect the posterior on another.

However, for VB approximations, we can evaluate the derivative on the left-hand side of equation (2) directly. Using linear response variational Bayes (LRVB) (Giordano et al., 2015, 2016), we have

\frac{d}{d\epsilon} E_{q_\epsilon^x}[g(\theta)] = \int \frac{q_\epsilon^x(\theta)}{p(\theta \mid \epsilon)}\, q_\eta(\theta)^T H^{-1} g_\eta\, p_c(\theta)\, d\theta,    (4)

where

g_\eta := \frac{\partial E_{q_\epsilon^x}[g(\theta)]}{\partial \eta}, \quad q_\eta(\theta) := \frac{\partial \log q(\theta; \eta)}{\partial \eta}, \quad H := \frac{\partial^2\, \mathrm{KL}(q(\theta; \eta)\, \|\, p_\epsilon^x)}{\partial \eta\, \partial \eta^T}.

It follows immediately from the definition in equation (3) that we can define the variational influence function

I_\epsilon^q(\theta) := \frac{q_\epsilon^x(\theta)}{p(\theta \mid \epsilon)}\, q_\eta(\theta)^T H^{-1} g_\eta.    (5)
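At ε = 0 the covariance in equation (2) simplifies: since (p_c(θ) − p_0(θ))/p_0(θ) = p_c(θ)/p_0(θ) − 1 and constants do not affect a covariance, S_0^{p_c} = Cov_{p_0^x}(g(θ), p_c(θ)/p_0(θ)). The sketch below estimates this from posterior draws; the Gaussian "posterior" and both priors are stand-ins invented for illustration (in practice the draws would come from MCMC or another sampler).

    # Sketch: Monte Carlo estimate of the local sensitivity S_0^{pc} in
    # equation (2) at eps = 0. All distributions here are illustrative.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    theta = rng.normal(1.0, 0.5, size=200_000)  # stand-in posterior draws
    g = theta                                   # g(theta) = theta: the posterior mean

    p0 = stats.norm(0.0, 2.0)                   # original prior (assumed)
    pc = stats.norm(1.0, 2.0)                   # contaminating prior (assumed)

    # Cov(g(theta), pc(theta) / p0(theta)) under the posterior; the "-1" from
    # (pc - p0) / p0 drops out of the covariance.
    ratio = pc.pdf(theta) / p0.pdf(theta)
    S_0 = np.cov(g, ratio)[0, 1]
    print(S_0)  # > 0: moving prior mass toward theta = 1 increases E[g(theta)]

Equation (3) repackages this as the integral of a single function I_0(θ) against p_c(θ), so one influence function summarizes the sensitivity to every candidate p_c at once.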

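Equation (5) is a pointwise formula once the variational optimum is in hand. The sketch below evaluates it for a univariate Gaussian q(θ; η) = N(m, σ²) with η = (m, log σ); the optimized parameters, the KL Hessian H, the Jacobian g_η, and the prior are all placeholders (in practice H and g_η would come from automatic differentiation of the variational objective, as in the LRVB papers), so the numbers only illustrate the mechanics.

    # Sketch: the variational influence function of equation (5) at eps = 0,
    # for q(theta; eta) = N(m, sigma^2) with eta = (m, log sigma).
    # H, g_eta, and all distributions are assumed placeholder values.
    import numpy as np
    from scipy import stats
    from scipy.integrate import trapezoid

    m, log_sigma = 1.0, np.log(0.5)          # optimized variational parameters (assumed)
    sigma = np.exp(log_sigma)
    H = np.array([[4.0, 0.0], [0.0, 8.0]])   # placeholder KL Hessian (positive definite)
    g_eta = np.array([1.0, 0.0])             # g(theta) = theta => dE_q[g]/d(eta) = (1, 0)
    p0 = stats.norm(0.0, 2.0)                # original prior p(theta | eps = 0) (assumed)

    def q_score(theta):
        """q_eta(theta) = d log q(theta; eta) / d(eta) for eta = (m, log sigma)."""
        z = (theta - m) / sigma
        return np.stack([z / sigma, z**2 - 1.0], axis=-1)

    def influence_vb(theta):
        """I_0^q(theta) = q(theta)/p0(theta) * q_eta(theta)^T H^{-1} g_eta."""
        q_pdf = stats.norm(m, sigma).pdf(theta)
        return q_pdf / p0.pdf(theta) * (q_score(theta) @ np.linalg.solve(H, g_eta))

    # Sensitivity to any contaminating prior pc is then a single integral (eq. 4):
    grid = np.linspace(-10.0, 10.0, 20001)
    pc = stats.norm(1.0, 2.0)
    print(trapezoid(influence_vb(grid) * pc.pdf(grid), grid))

The ratio q_ε^x(θ)/p(θ | ε) and the score q_η(θ) are cheap to evaluate, so once H^{-1} g_η is computed (one linear solve), assessing each new p_c costs only one integral, with no re-fitting of the variational approximation.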
